Approximate String Matching over Ziv
نویسندگان
چکیده
We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, speciically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions, in O(mkn + R) time. The existence problem needs O(mkn) time. We also show that the algorithm can be adapted to run in O(k 2 n + min(mkn; m 2 (mm) k) + R) average time, where is the alphabet size. The experimental results show a speedup over the basic approach for moderate m and small k.
منابع مشابه
Approximate String Matching over Ziv - LempelCompressed
We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, speciically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions, in O(mkn + R) ti...
متن کاملA General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text
We address in this paper the problem of string matching on Lempel-Ziv compressed text. The goal is to search a pattern in a text without uncompressing. This is a highly relevant issue, since it is essential to have compressed text databases where eecient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts th...
متن کاملA Lossy Data Compression Based on String Matching: Preliminary Analysis and Suboptimal Algorithms
A practical suboptimal algorithm (source coding) for lossy (non-faithful) data compression is discussed. This scheme is based on an approximate string matching, and it naturally extends lossless (faithful) Lempel-Ziv data compression scheme. The construction of the algorithm is based on a careful probabilistic analysis of an approximate string matching problem that is of its own interest. This ...
متن کاملApproximate String Matching with Lempel-Ziv Compressed Indexes
A compressed full-text self-index for a text T is a data structure requiring reduced space and able of searching for patterns P in T . Furthermore, the structure can reproduce any substring of T , thus it actually replaces T . Despite the explosion of interest on self-indexes in recent years, there has not been much progress on search functionalities beyond the basic exact search. In this paper...
متن کامل